An Important Issue in Data Mining: Data Cleaning

Authors

  • Qi Xiao Yang
  • Sam Yuan Sung
  • Chun Lu
  • Jay Rajasekera
Abstract

School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543; {ssung, luchun}@comp.nus.edu.sg; tel: (65) 8746148

Qi Xiao Yang: Institute of High Performance Computing, 89B Science Park Drive #0105/08, The Rutherford, Singapore 118261; tel: (65) 7709265

Jay Rajasekera: Graduate School of International Management, International University of Japan; jrr@iuj.ac.jp; tel: (81) 257791531

Similar resources

Handling Missing Values in Data Mining

Missing values and the problems they cause are very common in the data cleaning process. Several methods have been proposed to handle missing data in datasets and avoid the problems it causes. This paper discusses various problems caused by missing values and different ways of dealing with them. Missing data is a familiar and unavoidable problem in large datasets and is widely discussed i...

Using well defined tokens in similarity function for record matching in data cleaning techniques

The integration of information is an important area of research in databases. Duplicate elimination, the problem of detecting database records that are approximate rather than exact duplicates yet describe the same real-world entity, is an important data cleaning problem. To ensure high data quality, a data warehouse must cleanse its data by detecting and eliminating redundant data. Dur...

An Efficient Algorithm for Data Cleaning of Web Logs with Spider Navigation Removal

The World Wide Web is growing rapidly, with an exponential increase in websites that provide users with vast amounts of information. Text files called web logs record a user's clicks whenever the user visits a website. Web usage mining is a branch of web mining that applies mining techniques to server logs containing user clickstreams....

A Unified Framework and Sequential Data Cleaning Approach for a Data Warehouse

Data cleaning is the process of identifying and removing errors in a data warehouse. Data cleaning is a very important part of the data mining process. Most organizations need high-quality data, so the quality of the data in the warehouse must be improved before mining. The framework available for data cleaning offers the fundamental services for data cleaning s...

Declarative XML Data Cleaning with XClean

Data cleaning is the process of correcting anomalies in a data source, which may, for instance, be due to typographical errors or duplicate representations of an entity. It is a crucial task in customer relationship management, data mining, and data integration. With the growing amount of XML data, approaches to effectively and efficiently clean XML are needed, an issue not addressed by existing ...

Publication date: 2002